.TH PURE-GEN 1 "2009-02-26" "Pure" "Pure Manual"
.hw name-space
.hw name-spaces
.SH NAME
pure-gen \- Pure interface generator
.SH SYNOPSIS
\fBpure-gen\fP [\fIoptions\fP ...] \fIinput-file\fP
.SH OPTIONS
.SS General Options
.TP
\fB-h\fP, \fB--help\fP
Print a brief help message and exit.
.TP
\fB-V\fP, \fB--version\fP
Print version number and exit.
.TP
\fB-e\fP, \fB--echo\fP
Echo preprocessor lines. Prints all processed \fB#define\fPs, useful for
debugging purposes.
.TP
\fB-v\fP, \fB--verbose\fP
Show parameters and progress information. Gives useful information about the
conversion process.
.TP
\fB-w\fP[\fIlevel\fP], \fB--warnings\fP[=\fIlevel\fP]
Display warnings, \fIlevel\fP = 0 (disable most warnings), 1 (default, shows
important warnings only) or 2 (lots of additional warnings useful for
debugging purposes).
.SS Preprocessor Options
.TP
\fB-I\fP \fIpath\fP, \fB--include\fP \fIpath\fP
Add include path. Passed to the C preprocessor.
.TP
\fB-D\fP \fIname\fP[=\fIvalue\fP], \fB--define\fP \fIname\fP[=\fIvalue\fP]
Define symbol. Passed to the C preprocessor.
.TP
\fB-U\fP \fIname\fP, \fB--undefine\fP \fIname\fP
Undefine symbol. Passed to the C preprocessor.
.TP
\fB-C\fP \fIoption\fP, \fB--cpp\fP \fIoption\fP
Pass through other preprocessor options and arguments.
.SS Generator Options
.TP
\fB-f\fP \fIiface\fP, \fB--interface\fP \fIiface\fP
Interface type (\(aqextern\(aq, \(aqc\(aq, \(aqffi\(aq or \(aqc-ffi\(aq).
Default is \(aqextern\(aq. The \(aqextern\(aq and \(aqc\(aq types generate
Pure \fBextern\fP declarations, which is what you want in most cases.
\(aqffi\(aq and \(aqc-ffi\(aq employ Pure's libffi interface instead. The
\(aqc\(aq and \(aqc-ffi\(aq types cause an additional C wrapper module to be
created (see \fIGENERATING C CODE\fP). These can also be combined with the
\(aq-auto\(aq suffix which creates C wrappers only when needed to get C struct
arguments and returns working, see \fIDEALING WITH C STRUCTS\fP for details.
.TP
\fB-l\fP \fIlib\fP, \fB--lib-name\fP \fIlib\fP
Add dynamic library module to be imported in the Pure output file. Default is
\fB-l\fP \fIc-file\fP (the filename specified with \fB-c\fP, see below,
without filename extension) if one of the \fB-fc\fP options was specified,
none otherwise.
.TP
\fB-m\fP \fIname\fP, \fB--namespace\fP \fIname\fP
Module namespace in which symbols should be declared.
.TP
\fB-p\fP \fIprefix\fP, \fB--prefix\fP \fIprefix\fP
Module name prefix to be removed from C symbols.
.TP
\fB-P\fP \fIprefix\fP, \fB--wrap\fP \fIprefix\fP
Prefix to be prepended to C wrapper symbols (\fB-fc\fP and friends). Default
is \(aqPure_\(aq.
.TP
\fB-a\fP, \fB--all\fP
Include ``hidden'' symbols in the output. Built-in preprocessor symbols and
symbols starting with an underscore are excluded unless this option is
specified.
.TP
\fB-s\fP \fIpattern\fP, \fB--select\fP \fIpattern\fP
Selection of C symbols to be included in the output. \fIpattern\fP takes the
form [\fIglob-patterns\fP::][\fIregex-pattern\fP], designating a comma
separated list of glob patterns matching the source filenames, and an extended
regular expression matching the symbols to be processed. See \fBglob\fP(7) and
\fBregex\fP(7). The default \fIpattern\fP is empty which matches all symbols
in all source modules.
.TP
\fB-x\fP \fIpattern\fP, \fB--exclude\fP \fIpattern\fP
Like \fB-s\fP, but \fIexcludes\fP all matching C symbols from the selection.
.TP
\fB-t\fP \fIfile\fP, \fB--template\fP \fIfile\fP
Specify a C template file to be used with C wrapper generation
(\fB-fc\fP). See \fIGENERATING C CODE\fP for details.
.TP
\fB-T\fP \fIfile\fP, \fB--alt-template\fP \fIfile\fP
Specify an alternate C template file to be used with C wrapper generation
(\fB-fc\fP). See \fIGENERATING C CODE\fP for details.
.SS Output Options
.TP
\fB-n\fP, \fB--dry-run\fP
Only parse without generating any output.
.TP
\fB-N\fP, \fB--noclobber\fP
Append output to existing files.
.TP
\fB-o\fP \fIfile\fP, \fB--output\fP \fIfile\fP
Pure output (.pure) filename. Default is \fIinput-file\fP with new
extension .pure.
.TP
\fB-c\fP \fIfile\fP, \fB--c-output\fP \fIfile\fP
C wrapper (.c) filename (\fB-fc\fP). Default is \fIinput-file\fP with new
extension .c.
.SH DESCRIPTION
.B pure-gen
generates Pure bindings for C functions from a C header file. For instance,
the command
.sp
.nf
pure-gen foo.h
.fi
.sp
creates a Pure module foo.pure with
.B extern
declarations for the constants (\fB#define\fPs and enums) and C routines
declared in the given C header file and (recursively) its includes.
.PP
.B pure-gen
only accepts a single header file on the command line. If you need to parse
more than one header in a single run, you can just create a dummy header with
all the necessary \fB#include\fPs in it and pass that to
.B pure-gen
instead.
.PP
When invoked with the
.B -n
option,
.B pure-gen
performs a dry run in which it only parses the input without actually
generating any output files. This is useful for checking the input (possibly
in combination with the \fB-e\fP, \fB-v\fP and/or \fB-w\fP options) before
generating output. A particularly useful example is
.sp
.nf
pure-gen -ne foo.h \e
  | awk '$1=="#" && $2~/^[0-9]+$/ && $3!~/^"<.*>"$/  { print $3 }' \e
  | sort | uniq
.fi
.sp
which prints on standard output all headers which are included in the source.
This helps to decide which headers you want to be included in the output, so
that you can set up a corresponding filter patterns (\fB-s\fP and \fB-x\fP
options, see below).
.PP
The \fB-I\fP, \fB-D\fP and \fB-U\fP options are simply passed to the C
preprocessor, as well as any other option or argument escaped with the
\fB-C\fP flag. This is handy if you need to define additional preprocessor
symbols, add directories to the include search path, etc., see
.BR cpp (1)
for details.
.PP
There are some other options which affect the generated output. In particular,
\fB-f\ c\fP generates a C wrapper module along with the Pure module (see
\fIGENERATING C CODE\fP below), and \fB-f\ ffi\fP generates a wrapper using
Pure's ffi module. Moreover, \fB-l\ libfoo\fP generates a \(aq\fBusing\fP
\(dqlib:libfoo\(dq\(aq declaration in the Pure source, for modules which
require a shared library to be loaded. Any number of \fB-l\fP options can be
specified.
.PP
Other options for more advanced uses are explained in the following sections.
.SH FILTERING
Note that
.B pure-gen
always parses the given header file as well as \fIall\fP its includes. If the
header file includes system headers, by default you will get those
declarations as well. This is often undesirable. As a remedy,
.B pure-gen
normally excludes built-in \fB#define\fPs of the C preprocessor, as well as
identifiers with a leading underscore (which are often found in system
headers) from processing. You can use the \fB-a\fP option to disable this, so
that all these symbols are included as well.
.PP
In addition, the \fB-s\fP and \fB-x\fP options enable you to filter C symbols
using the source filename and the symbol as search criteria. For instance, to
just generate code for a single header foo.h and none of the other headers
included in foo.h, you can invoke
.B pure-gen
as follows:
.sp
.nf
pure-gen -s foo.h:: foo.h
.fi
.sp
Note that even in this case all included headers will be parsed so that
\fB#define\fPd constants and enum values can be resolved, but the generated
output will only contain definitions and declarations from the given header
file.
.PP
In general, the \fB-s\fP option takes an argument of the form
\fIglob-patterns\fP::\fIregex-pattern\fP denoting a comma-separated list of
glob patterns to be matched against the source filename in which the symbol
resides, and an extended regex to be matched against the symbol itself. The
\fIglob-patterns\fP:: part can also be omitted in which case it defaults to
\(aq::\(aq which matches any source file. The regex can also be empty, in
which case it matches any symbol. The generated output will contain only the
constant and function symbols matching the given regex, from source files
matching any of the the glob patterns. Thus, for instance, the option
\(aq-s\ foo.h,bar.h::^(foo|bar)_\(aq pulls all symbols prefixed with either
\(aqfoo_\(aq or \(aqbar_\(aq from the files foo.h and bar.h in the current
directory.
.PP
Instead of \(aq::\(aq you can also use a single semicolon \(aq;\(aq to
separate glob and regex pattern. This is mainly for Windows compatibility,
where the msys shell sometimes eats the colons or changes them to \(aq;\(aq.
.PP
The \fB-x\fP option works exactly the same, but \fIexcludes\fP all matching
symbols from the selection. Thus, e.g., the option \(aq-x\ ^bar_\(aq causes
all symbols with the prefix \(aqbar_\(aq to \fInot\fP be included in the
output module.
.PP
Processing of glob patterns is performed using the customary rules for
filename matching, see \fBglob\fP(7) for details. Note that some include files
may be specified using a full pathname. This is the case, in particular, for
system includes such as \(aq#include <stdio.h>\(aq, which are resolved by the
C preprocessor employing a search of the system include directories (as well
as any directories named with the \fB-I\fP option).
.PP
Since the \fB*\fP and \fB?\fP wildcards never match the pathname separator
\(aq/\(aq, you have to specify the path in the glob patterns in such
cases. Thus, e.g., if the foo.h file actually lives in either /usr/include or
/usr/local/include, then it must be matched using a pattern like
\(aq/usr/include/*.h,/usr/local/include/*.h::\(aq. Just \(aqfoo.h::\(aq will
not work in this case. On the other hand, if you have set up your C sources in
some local directory then specifying a relative pathname is ok.
.SH NAME MANGLING
The \fB-s\fP option is often used in conjuction with the \fB-p\fP option,
which lets you specify a ``module name prefix'' which should be stripped off
from C symbols. Case is insignificant and a trailing underscore will be
removed as well, so \(aq-p\ foo\(aq turns \(aqfooBar\(aq into \(aqBar\(aq and
\(aqFOO_BAR\(aq into \(aqBAR\(aq. Moreover, the \fB-m\fP option allows you to
specify the name of a Pure namespace in which the resulting constants and
functions are to be declared. So, for instance, \(aq-s\ "^(foo|FOO)" -p\ foo
-m\ foo\(aq will select all symbols starting with the \(aqfoo\(aq or
\(aqFOO\(aq prefix, stripping the prefix from the selected symbols and finally
adding a \(aqfoo::\(aq namespace qualifier to them instead.
.SH GENERATING C CODE
As already mentioned, pure-gen can be invoked with the \fB-fc\fP or
\fB-fc-ffi\fP option to create a C wrapper module along with the Pure module
it generates. There are various situations in which this is preferable, e.g.:
.IP \- 3
You are about to create a new module for which you want to generate some
boilerplate code.
.IP \- 3
The C routines to be wrapped aren't available in a shared library, but in some
other form (e.g., object file or static library).
.IP \- 3
You need to inject some custom code into the wrapper functions (e.g., to
implement custom argument preprocessing or lazy dynamic loading of functions
from a shared library).
.IP \- 3
The C routines can't be called directly through Pure externs.
.PP
The latter case might arise, e.g., if the module uses non-C linkage or calling
conventions, or if some of the operations to be wrapped are actually
implemented as C macros. (Note that in order to wrap macros as functions
you'll have to create a staged header which declares the macros as C
functions, so that they are wrapped in the C module.
.B pure-gen
doesn't do this automatically.)
.PP
Another important case is that some of the C routines pass C structs by value
or return them as results. This is discussed in more detail in the following
section.
.PP
For instance, let's say that we want to generate a wrapper foo.c from the
foo.h header file whose operations are implemented in some library libfoo.a or
libfoo.so. A command like the following generates both the C wrapper and the
corresponding Pure module:
.sp
.nf
pure-gen -fc foo.h
.fi
.sp
This creates foo.pure and foo.c, with an import clause for "lib:foo" at the
beginning of the Pure module. (You can also change the name of the Pure and C
output files using the \fB-o\fP and \fB-c\fP options, respectively.)
.PP
The generated wrapper is just an ordinary C file which should be compiled to a
shared object (dll on Windows) as usual. E.g., using gcc on Linux:
.sp
.nf
gcc -shared -o foo.so foo.c -lfoo
.fi
.sp
That's all. You should now be able to use the foo module by just putting the
declaration \(aq\fBusing\fP foo;\(aq into your programs. The same approach
also works with the ffi interface if you replace the \fB-fc\fP option with
\fB-fc-ffi\fP.
.PP
You can also adjust the C wrapper code to some extent by providing your own
template file, which has the following format:
.sp
.nf
/* frontmatter here */
#include %h
%%

/* wrapper here */
%r %w(%p)
{
  return %n(%a);
}
.fi
.sp
Note that the code up to the symbol \(aq%%\(aq on a line by itself denotes
``frontmatter'' which gets inserted at the beginning of the C file. (The
frontmatter section can also be empty or missing altogether if you don't need
it, but usually it will contain at least an \fB#include\fP for the input
header file.)
.PP
The rest of the template is the code for each wrapper function. Substitutions
of various syntactical fragments of the function definition is performed using
the following placeholders:
.TP
%h
input header file
.TP
%r
return type of the function
.TP
%w
the name of the wrapper function
.TP
%p
declaration of the formal parameters of the wrapper function
.TP
%n
the real function name (i.e., the name of the target C function to be called)
.TP
%a
the arguments of the function call (formal parameters with types stripped off)
.TP
%%
escapes a literal %
.PP
A default template is provided if you don't specify one (which looks pretty
much like the template above, minus the comments). A custom template is
specified with the \fB-t\fP option. (There's also a \fB-T\fP option to specify
an ``alternate'' template for dealing with routines returning struct values,
see \fIDEALING WITH C STRUCTS\fP.)
.PP
For instance, suppose that we place the sample template above into a file
foo.templ and invoke
.B pure-gen
on the foo.h header file as follows:
.sp
.nf
pure-gen -fc -t foo.templ foo.h
.fi
.sp
Then in foo.c you'd get C output code like the following:
.sp
.nf
/* frontmatter here */
#include "foo.h"

/* wrapper here */
void Pure_foo(int arg0, void* arg1)
{
  return foo(arg0, arg1);
}

/* wrapper here */
int Pure_bar(int arg0)
{
  return foo(arg0);
}
.fi
.sp
As indicated, the wrapper function names are usually stropped with the
\(aqPure_\(aq prefix. You can change this with the \fB-P\fP option.
.PP
This also works great to create boilerplate code for new modules. For this
purpose the following template will do the trick:
.sp
.nf
/* Add #includes etc. here. */
%%

%r %n(%p)
{
  /* Enter code of %n here. */
}
.fi
.SH DEALING WITH C STRUCTS
Modern C compilers allow you to pass C structs by value or return them as
results from a C function. This represents a problem, because Pure doesn't
provide any support for that in its extern declarations. Even Pure's libffi
interface only has limited support for C structs (no unions, no bit fields),
and at present
.B pure-gen
itself does not keep track of the internal structure of C structs either.
.PP
Hence
.B pure-gen
will bark if you try to wrap an operation which passes or returns a C struct,
printing a warning message like the following which indicates that the given
function could not be wrapped:
.sp
.nf
Warning: foo: struct argument or return type, try -fc-auto
.fi
.sp
What Pure \fIdoes\fP know is how to pass and return \fIpointers\fP to C
structs in its C interface. This makes it possible to deal with struct
arguments and return values in the C wrapper. To make this work, you need to
create a C wrapper module as explained in the previous section. However, as C
wrappers are only needed for functions which actually have struct arguments or
return values, you can also use the \fB-fc-auto\fP option (or
\fB-fc-ffi-auto\fP if you prefer the ffi interface) to only generate the C
wrapper when required. This saves the overhead of an extra function call if
it's not actually needed.
.PP
Struct arguments in the original C function then become struct pointers in the
wrapper function. E.g., if the function is declared in the header as follows:
.sp
.nf
typedef struct { double x, y; } point;
extern double foo(point p);
.fi
.sp
Then the generated wrapper code becomes:
.sp
.nf
double Pure_foo(point* arg0)
{
  return foo(*arg0);
}
.fi
.sp
Which is declared in the Pure interface as:
.sp
.nf
extern double Pure_foo(point*) = foo;
.fi
.sp
Struct return values are handled by returning a pointer to a static variable
holding the return value. E.g.,
.sp
.nf
extern point bar(double x, double y);
.fi
.sp
becomes:
.sp
.nf
point* Pure_bar(double arg0, double arg1)
{
  static point ret;
  ret = bar(arg0, arg1); return &ret;
}
.fi
.sp
Which is declared in the Pure interface as:
.sp
.nf
extern point* Pure_bar(double, double) = bar;
.fi
.sp
(Note that the generated code in this case comes from an alternate template.
It's possible to configure the alternate template just like the normal one,
using the \fB-T\fP option instead of \fB-t\fP. See the \fIGENERATING C CODE\fP
section above for details about code templates.)
.PP
In a Pure script you can now call foo and bar as:
.sp
.nf
> foo (bar 0.0 1.0);
.fi
.sp
Note, however, that the pointer returned by \(aqbar\(aq points to static
storage which will be overwritten each time you invoke the \(aqbar\(aq
function. Thus in the following example \fIboth\fP u and v will point to the
same \(aqpoint\(aq struct, namely that defined by the latter call to
\(aqbar\(aq:
.sp
.nf
> let u = bar 1.0 0.0; let v = bar 0.0 1.0;
.fi
.sp
Which most likely is \fInot\fP what you want. To avoid this, you'll have to
take dynamic copies of returned structs. It's possible to do this manually by
fiddling around with malloc and memcpy, but the most convenient way is to
employ the struct functions provided by Pure's ffi module:
.sp
.nf
> using ffi;
> let point_t = struct_t (double_t, double_t);
> let u = copy_struct point_t (bar 1.0 0.0);
> let v = copy_struct point_t (bar 0.0 1.0);
.fi
.sp
Now u and v point to different, malloc'd structs which even take care of
freeing themselves when they are no longer needed. Moreover, the ffi module
also allows you to access the members of the structs in a direct
fashion. Please refer to the
.B pure-ffi
documentation for further details.
.SH NOTES
.B pure-gen
currently requires gcc (-E) as the C preprocessor. It also needs a version of
gcc which understands the
.B -fdirectives-only
option, which means gcc 4.3 or later. It will run with older versions of gcc,
but then you'll get an error message from gcc indicating that it doesn't
understand the -fdirectives-only option.
.B pure-gen
then won't be able to extract any \fB#define\fPd constants from the header
files.
.PP
.B pure-gen
itself is written in Pure, but uses a C parser implemented in Haskell, based
on the Language.C library written by Manuel Chakravarty and others.
.PP
.B pure-gen
can only generate C bindings at this time. Other languages may have their own
calling conventions which make it hard or even impossible to call them
directly through Pure's extern interface. However, if your C compiler knows
how to call the other language, then it may be possible to interface to
modules written in that language by faking a C header for the module and
generating a C wrapper with a custom code template, as described in
\fIGENERATING C CODE\fP. In principle, this approach should even work with
behemoths like C++, although it might be easier to use third-party tools like
SWIG for that purpose.
.PP
In difference to SWIG and similar tools,
.B pure-gen
doesn't require you to write any special ``interface files'', is controlled
entirely by command line options, and the amount of marshalling overhead in C
wrappers is negligible. This is possible since
.B pure-gen
targets only the Pure-C interface and Pure has good support for interfacing to
C built into the language already.
.PP
.B pure-gen
usually works pretty well if the processed header files are written in a
fairly clean fashion. Nevertheless, some libraries defy fully automatic
wrapper generation and may thus require staged headers and/or manual editing
of the generated output to get a nice wrapper module.
.PP
In complex cases it may also be necessary to assemble the output of several
runs of
.B pure-gen
for different combinations of header files, symbol selections and/or
namespace/prefix settings. In such a situation it is usually possible to just
concatenate the various output files produced by
.B pure-gen
to consolidate them into a single wrapper module. To make this easier,
.B pure-gen
provides the \fB-N\fP a.k.a. \fB--noclobber\fP option which appends the output
to existing files instead of overwriting them. See the example below.
.SH EXAMPLE
For the sake of a substantial, real-world example, here is how you can wrap
the entire GNU Scientific Library in a single Pure module mygsl.pure, with the
accompanying C module in mygsl.c:
.sp
.nf
rm -f mygsl.pure mygsl.c
DEFS=-DGSL_DISABLE_DEPRECATED
for x in /usr/include/gsl/gsl_*.h; do
  pure-gen $DEFS -N -fc-auto -s "$x::" $x -o mygsl.pure -c mygsl.c
done
.fi
.sp
The C module can then be compiled with:
.sp
.nf
gcc $DEFS -shared -o mygsl.so mygsl.c
.fi
.sp
Note that the GSL_DISABLE_DEPRECATED symbol must be defined here to avoid some
botches with constants being defined in incompatible ways in different GSL
headers. Also, some GSL versions have broken headers lacking some system
includes which causes hiccups in
.BR pure-gen 's
C parser. Fixing those errors or working around them through some appropriate
cpp options should be a piece of cake, though. ;-)
.SH LICENSE
BSD-like. See the accompanying COPYING file for details.
.SH AUTHORS
Scott E. Dillard (University of California at Davis), Albert Graef (Johannes
Gutenberg University at Mainz, Germany).
.SH SEE ALSO
.TP
.BR pure (1)
.TP
.B Language.C
A C parser written in Haskell by Manuel Chakravarty et al,
\fIhttp://www.sivity.net/projects/language.c\fP.
.TP
.B SWIG
The Simplified Wrapper and Interface Generator, \fIhttp://www.swig.org\fP.
